Disfluency and Out-Of-Vocabulary Word Processing in Arabic Speech Understanding
Abstract
Disfluencies inherent in spontaneous speech, and out-of-vocabulary words, which are omnipresent in any oral utterance transcribed by a speech recognizer, are a real challenge for speech understanding systems. In this paper, we propose a method for processing disfluencies and out-of-vocabulary words in the context of automatic Arabic speech understanding. Our method, based on a robust and partial analysis of Arabic oral utterances (conceptual segment analysis), is effective for the treatment of such phenomena. The method has been tested through the understanding module of the SARF system, an interactive vocal server for Tunisian railway information.
Similar resources
Arabic Language Modeling with Stem-derived Morphemes for Automatic Speech Recognition
The goal of this dissertation is to introduce a method for deriving morphemes from Arabic words using stem patterns, a feature of Arabic morphology. The motivations are three-fold: modeling with morphemes rather than words should help address the out-of-vocabulary problem; working with stem patterns should prove to be a cross-dialectally valid method for deriving morphemes using a small amount o...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
The Impact of Teachers' Training on the Reliability of Tests and Assessments in Governmental and Non-governmental Sections
Assessment is considered one of the fundamental elements in the field of foreign language acquisition. For communication to take place, learners need to know an adequate amount of vocabulary. The salient role of vocabulary in the field of foreign language acquisition has resulted in the publication of several hundred papers and dozens of books. Due to the dominant role of ...
Tight Integration of Speech Disfluency Removal into SMT
Speech disfluencies are one of the main challenges of spoken language processing. Conventional disfluency detection systems deploy a hard decision, which can have a negative influence on subsequent applications such as machine translation. In this paper we suggest a novel approach in which disfluency detection is integrated into the translation process. We train a CRF model to obtain a disfluen...
Analysis of Morph-Based Speech Recognition and the Modeling of Out-of-Vocabulary Words Across Languages
We analyze subword-based language models (LMs) in large-vocabulary continuous speech recognition across four “morphologically rich” languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. By estimating n-gram LMs over sequences of morphs instead of words, better vocabulary coverage and reduced data sparsity are obtained. Standard word LMs suffer from high out-of-vocabulary (OOV) r...